@mgdenno
Contributor

@mgdenno mgdenno commented Dec 23, 2025

This PR does a couple of things:

  1. A new version of PySpark (v4.1.0) appears to have been released that is not compatible with the Sedona version we are using. Locking PySpark at 4.0 for now.
  2. The way the metric class was being added to the evaluation class appears to have been incorrect: it was being called with the default parameters even when parameters were passed. What is puzzling is that it seems to work locally and in remote teehr-hub, but not in a local teehr-hub deployment. I don't like that I can't explain that.
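
A minimal sketch of the kind of failure described in item 2, for illustration only. The classes below are hypothetical stand-ins and are not TEEHR's actual code; the default table name `joined_timeseries` is also an assumption. It shows how an accessor that accepts arguments but does not forward them would silently fall back to defaults:

```python
class Metrics:
    """Hypothetical stand-in for a metrics accessor (not TEEHR's real class)."""

    def __init__(self, table_name="joined_timeseries"):
        # The default table name here is assumed for illustration.
        self.table_name = table_name


class BuggyEvaluation:
    """Attaches the accessor in a way that drops the caller's arguments."""

    def metrics(self, **kwargs):
        # Bug: kwargs are accepted but never forwarded, so defaults always win.
        return Metrics()


class FixedEvaluation:
    """Attaches the accessor and forwards the caller's arguments."""

    def metrics(self, **kwargs):
        return Metrics(**kwargs)


buggy = BuggyEvaluation().metrics(table_name="fcst_joined_timeseries")
fixed = FixedEvaluation().metrics(table_name="fcst_joined_timeseries")
print(buggy.table_name)  # the default, not the requested table
print(fixed.table_name)  # the requested table
```

A bug of this shape raises no error, which may be part of why it is hard to tell from the outside whether the passed parameters are being used.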

@mgdenno
Contributor Author

mgdenno commented Dec 23, 2025

@samland1116 if you update line 13 of your project.garden.yaml to the following, it should build:

devTeehrVersion: f05f238883ba7109dc9218bc93e2639739495f7c

@mgdenno mgdenno requested a review from samlamont December 23, 2025 02:11
@mgdenno
Contributor Author

mgdenno commented Jan 5, 2026

@samlamont I can't run the tests because tests/data/v0_3_study_test.tar.gz seems to have your file path hard-coded in it. Curious, @samland1116, are you able to run the tests?

@samlamont
Collaborator

@mgdenno I see the issue with the tests. Earlier I changed setup_v0_3_study to just unpack a pre-created evaluation instead of creating it in each test, to save time. But the Iceberg metadata files contain absolute file paths, so this won't work (it worked on my machine since I had the files locally in a temp drive). I'll roll back this change or think about a different solution.
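
One way to confirm this on an unpacked archive is to scan the table metadata for locations that point outside the extraction directory. This is a minimal sketch, assuming the metadata files match `*.metadata.json` and carry a top-level `location` field, and assuming local POSIX paths (remote URIs such as `s3://` are not handled). The function name is hypothetical:

```python
import json
from pathlib import Path


def find_foreign_locations(root):
    """Return (metadata file, location) pairs whose location is outside root."""
    root = Path(root).resolve()
    foreign = []
    for meta_file in sorted(root.rglob("*.metadata.json")):
        location = json.loads(meta_file.read_text()).get("location", "")
        if not location:
            continue
        # Drop a "file:" scheme and any extra leading slashes, if present.
        path_str = location
        if path_str.startswith("file:"):
            path_str = "/" + path_str[len("file:"):].lstrip("/")
        resolved = Path(path_str).resolve()
        # Flag locations that are neither root itself nor somewhere below it.
        if resolved != root and root not in resolved.parents:
            foreign.append((meta_file, location))
    return foreign
```

Any entries returned would indicate metadata written on another machine, which is consistent with the archive only working where the original paths exist.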

@mgdenno
Contributor Author

mgdenno commented Jan 5, 2026

@samlamont I wonder if there is a setting to use relative paths or something. Either way, I guess it would be good to roll back for now.

@samlamont
Collaborator

Hmm, well, after reverting the change to the metrics class initialization, the table_name is being recognized for me in my local kind deployment in WSL.

For example this works:

ev.metrics(table_name="fcst_joined_timeseries").query(
    group_by=["primary_location_id", "configuration_name", "variable_name", "unit_name"],
    include_metrics=[
        sm.Count(),
        sm.Average(),
        dm.RelativeBias(),
        dm.NashSutcliffeEfficiency(),
        dm.KlingGuptaEfficiency()
    ]
).to_sdf().show()
